Evolutionary computation with noise perturbation and cluster analysis to discover biomarker sets
Authors
Abstract
In biomedical science, data mining techniques have been applied to extract statistically significant and clinically useful information from a given dataset. Finding biomarker gene sets for diseases can aid in understanding disease diagnosis, prognosis and therapy response. Gene expression microarrays have played an important role in such studies, yet their analysis has also drawn criticism. Because these datasets are feature-rich but case-poor, analyzing them carries a high risk of over-fitting (discovering spurious patterns). This paper describes a GA-SVM (genetic algorithm and support vector machine) hybrid combined with Gaussian noise perturbation (with a manually set noise gain) to combat over-fitting, determine the strongest signal in the dataset, and discover stable biomarker sets. A colon cancer gene expression microarray dataset is used to show that the strongest signal in the data (the optimal noise gain, at which a modest number of similar candidates emerge) can be found by a binary search. The diversity of candidates (measured by cluster analysis) is reduced by the noise perturbation, indicating that some of the patterns are being eliminated (we hope mostly spurious ones). An initial biological validation has been carried out, and genes have different levels of significance to the candidates; the discovered biomarker sets should nevertheless be studied further to ascertain their biological significance and clinical utility. Furthermore, statistical validation shows that the strongest signal in the data is spurious and that the discovered biomarker sets should be rejected. © 2010 Published by Elsevier B.V.
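The two mechanisms named in the abstract, Gaussian noise perturbation scaled by a noise gain and a binary search over that gain, can be illustrated with a minimal sketch. It is not the paper's implementation. The function names `perturb` and `search_gain`, the choice to scale the noise by each gene's standard deviation, and the `count_candidates` callback (a stand-in for a full GA-SVM run, assumed to return fewer candidates as the gain grows) are all assumptions made for illustration.

```python
import random
import statistics


def perturb(matrix, gain, rng):
    """Add zero-mean Gaussian noise to every expression value.

    Assumption (not from the paper): the noise standard deviation for
    each gene (column) is gain * that gene's standard deviation.
    """
    columns = list(zip(*matrix))
    sigmas = [statistics.pstdev(col) for col in columns]
    return [
        [x + rng.gauss(0.0, gain * s) for x, s in zip(row, sigmas)]
        for row in matrix
    ]


def search_gain(count_candidates, target, lo=0.0, hi=1.0, steps=20):
    """Binary search for the largest gain that still yields at least
    `target` candidates.

    `count_candidates` is a hypothetical callback standing in for a full
    GA-SVM run on perturbed data; it is assumed to be non-increasing in
    the gain.
    """
    for _ in range(steps):
        mid = (lo + hi) / 2
        if count_candidates(mid) >= target:
            lo = mid  # enough candidates survive: try more noise
        else:
            hi = mid  # too few candidates survive: back off
    return lo


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]  # rows = cases, columns = genes
    print(perturb(samples, 0.5, rng))
    # Toy stand-in for the GA-SVM: candidates vanish once the gain exceeds 0.3.
    print(search_gain(lambda g: 10 if g <= 0.3 else 0, target=5))
```

In a real experiment the callback would be expensive and stochastic, so the monotonicity assumption behind the binary search would hold only approximately.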
Related articles
A New Method for Geolocating of Radiation Sources Based on Evolutionary Computation of TDOA Equations
In this article, a new method is introduced for geolocating signal emitters, based on the evolutionary computation (EC) concept. In the proposed method, two well-known EC techniques, the Bees Algorithm (BA) and the Genetic Algorithm (GA), are used to estimate the positions of emitters by optimizing the hyperbola equations that result from Time Difference of Arr...
A Hierarchy Topology Design Using a Hybrid Evolutionary Algorithm in Wireless Sensor Networks
A wireless sensor network is a powerful network containing many wireless sensors with limited power resources, data processing, and transmission abilities. Sensor capabilities, including computational capacity, radio power, and memory, are very limited. Moreover, to design a hierarchy topology, in addition to energy optimization, find an optimum clusters number and best location of ...
Improved Niching and Encoding Strategies for Clustering Noisy Data Sets
Clustering is crucial to many applications in pattern recognition, data mining, and machine learning. Evolutionary techniques have been used with success in clustering, but most suffer from several shortcomings. We formulate requirements for efficient encoding, resistance to noise, and ability to discover the number of clusters automatically.
Double Duty: Genetic Algorithms for Organizational Design and Genetic Algorithms Inspired by Organizational Theory
Modularity is widely used in system analysis and design, for example in complex engineering products and their organization. Modularity is also the key to solving optimization problems efficiently via problem decomposition. We first discover modularity in a system, and then leverage this knowledge to improve the performance of the system. In this chapter, we tackle both problems with the alliance of org...
Development of an evolutionary fuzzy expert system for estimating future behavior of stock price
The stock market has always been an attractive area for researchers, since no method has yet been found to predict stock price behavior precisely. Due to its high uncertainty and volatility, it carries a higher risk than any other investment area, so stock price behavior is difficult to simulate. This paper presents a “data mining-based evolutionary fuzzy expert system” (DEFE...